fix: Publisher-mode synchronization option for failover scenario#3222
Conversation
📝 Walkthrough

Adds a publisher-mode startup path for sync services during Raft failover, refactors syncer lifecycle and status semantics, adjusts Raft startup/election behavior, updates E2E P2P identity and process logging, enables local module replace directives, and updates docs/config and changelog.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Failover as Failover Manager
    participant Raft as Raft Node
    participant Store as Block Store
    participant Sync as Sync Service
    participant P2P as P2P Init
    Note over Failover,Raft: Startup Run()
    Failover->>Raft: Is leader? / get raft config
    alt Leader & persisted raft config present
        Raft-->>Failover: leader=true
        Failover->>Store: query header/data heights
        Store-->>Failover: heights empty
        Failover->>Sync: StartForPublishing(ctx)
        Sync->>Sync: prepareStart (no initFromP2PWithRetry)
        Sync->>P2P: startSubscriber
        Sync-->>Failover: started (publishing)
    else Not publishing
        Failover->>Sync: Start(ctx)
        Sync->>P2P: initFromP2PWithRetry (peer discovery)
        Sync->>Sync: startSubscriber
        Sync-->>Failover: started (normal)
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (1 warning, 1 inconclusive)
The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).
Claude finished @alpe's task in 2m 48s — Code Review

Overall this is a well-motivated fix for a real race condition. The publisher-mode approach is a clean solution — skipping `initFromP2PWithRetry` on the fresh-start publishing path.

Critical / Bug
In …:

```go
if syncService.topicSubscription != nil {
	syncService.topicSubscription.Cancel()
}
```

Major
```go
// current — storeHeight ignored
if headerHeight > 0 || dataHeight > 0 {
	return false
}

// should be
if storeHeight > 0 || headerHeight > 0 || dataHeight > 0 {
	return false
}
```
These relative paths are only valid inside this monorepo. Once …

Partial component leak on startup failure (…)

By the time an error occurs in `prepareStart`, earlier components may already be running:

```go
func (syncService *SyncService[H]) prepareStart(ctx context.Context) ([]peer.ID, error) {
	peerIDs, err := syncService.setupP2PInfrastructure(ctx)
	if err != nil {
		return nil, ...
	}
	ok := false
	defer func() {
		if !ok { /* stop store, p2pServer, ex */ }
	}()
	// ... newSyncer ...
	ok = true
	return peerIDs, nil
}
```

Minor
The follower startup path calls …

Genesis fallback in `syncer.go`: the fallback state only sets `ChainID`, `InitialHeight`, and `LastBlockHeight`.

The guide text now says …

Log file descriptor leak in E2E helper (`sut_helper.go`)
Nit / Positive Notes
Codecov Report

❌ Patch coverage is …

Additional details and impacted files

```diff
@@ Coverage Diff @@
##             main    #3222      +/-   ##
==========================================
+ Coverage   61.43%   61.65%   +0.21%
==========================================
  Files         120      120
  Lines       12504    12588      +84
==========================================
+ Hits         7682     7761      +79
+ Misses       3960     3954       -6
- Partials      862      873      +11
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Force-pushed: ec0ffc4 to 59dc917
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
test/e2e/sut_helper.go (1)
189-195: ⚠️ Potential issue | 🟡 Minor

Close the per-process logfile handle.

`os.Create` on Line 192 leaves an open fd that is never closed. E2E suites spawn a lot of processes, so this can leak descriptors and delay flushing the captured logs.

Suggested change

```diff
 logfile, err := os.Create(logfileName)
 require.NoError(s.t, err)
+s.t.Cleanup(func() { _ = logfile.Close() })
 errReader = io.NopCloser(io.TeeReader(errReader, logfile))
 outReader = io.NopCloser(io.TeeReader(outReader, logfile))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/sut_helper.go` around lines 189 - 195, The per-process logfile created by os.Create (logfile) in the block that checks s.processLogDir() is never closed; wrap the logfile so it is closed when the returned readers are closed by replacing the io.NopCloser(io.TeeReader(..., logfile)) usage with a ReadCloser that closes the underlying logfile on Close, or otherwise ensure logfile.Close() is called (e.g., return a combined io.ReadCloser that delegates Read to the TeeReader and Close to logfile.Close) for both errReader and outReader so the file descriptor is released and buffered data flushed.

block/internal/syncing/syncer.go (1)
1204-1213: ⚠️ Potential issue | 🔴 Critical

Bootstrap the same genesis state as `initializeState()`.

This fallback only sets `ChainID`, `InitialHeight`, and `LastBlockHeight`, but Line 808 later executes the recovered block against `currentState.AppHash`. On a fresh node that means raft recovery runs against a different execution baseline than normal startup, which calls `exec.InitChain` and seeds `AppHash`, `DAHeight`, and `LastBlockTime`. Please reuse that bootstrap path here before applying the raft block.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@block/internal/syncing/syncer.go` around lines 1204 - 1213, The fallback after s.store.GetState should bootstrap the same full genesis state as initializeState() instead of only setting ChainID/InitialHeight/LastBlockHeight; update the recovery path that sets currentState and stateBootstrapped to call the same initialization logic (e.g., run the InitChain/bootstrap sequence used in initializeState) so that currentState.AppHash, DAHeight, LastBlockTime (and any other genesis-seeded fields) are populated before the raft/recovered block is executed; ensure you invoke the same executor/init routine (the InitChain or genesis bootstrap used by initializeState) rather than manually setting only those three fields.

pkg/sync/sync_service.go (1)
190-205: ⚠️ Potential issue | 🟠 Major

Unwind partially started components on startup errors.

By the time Line 233 returns from `setupP2PInfrastructure`, the store, exchange server, and exchange can already be running. If `newSyncer` fails here — or `startSubscriber` fails in the new publisher-mode path — these methods return without tearing those pieces down, so a failed start leaks live components into the process.

As per coding guidelines, "Be mindful of goroutine leaks in Go code".
Also applies to: 219-225, 231-245
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/sync/sync_service.go` around lines 190 - 205, The startup may leave store/exchange/exchange-server running if later steps fail (e.g. newSyncer or startSubscriber); update the startup sequence (functions prepareStart / setupP2PInfrastructure / initFromP2PWithRetry) to register and invoke cleanup on error: after setupP2PInfrastructure returns and before calling newSyncer or startSubscriber, capture the created components (store, exchangeServer, exchange or any returned tearDown/Close funcs) and ensure they are stopped when a subsequent error occurs (use a deferred cleanup or explicit teardown call on error paths in the block around prepareStart -> initFromP2PWithRetry -> newSyncer/startSubscriber). Reference symbols to change: prepareStart, setupP2PInfrastructure, initFromP2PWithRetry, newSyncer, and startSubscriber — ensure each path that returns an error calls the appropriate stop/Close/Shutdown methods for the started components to avoid goroutine/resource leaks.
🧹 Nitpick comments (1)
pkg/sync/sync_service_test.go (1)
63-97: Assert remote peer delivery, not just local initialization.

This test pays the cost of bringing up `client2`, but the only postcondition is `svc.storeInitialized`. A regression where `StartForPublishing` initializes locally but never broadcasts to peers would still pass. Please add an assertion on peer 2's view of the header so the new publisher-mode path is actually covered.
Verify each finding against the current code and only fix it if needed. In `@pkg/sync/sync_service_test.go` around lines 63 - 97, The test currently only asserts svc.storeInitialized after broadcasting; add an assertion that the remote peer (client2) actually receives the broadcasted header: before calling svc.WriteToStoreAndBroadcast, register or subscribe a handler on client2 to capture incoming P2PSignedHeader messages (using client2's subscribe/handler API), then after WriteToStoreAndBroadcast use require.Eventually to poll/assert that the handler received a header equal to signedHeader (or matching height/DataHash/AppHash). Reference client1/client2, NewHeaderSyncService, StartForPublishing, WriteToStoreAndBroadcast and svc.storeInitialized when locating where to add the subscription and the eventual assertion.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/evm/go.mod`:
- Around line 5-8: Remove the repo-local replace directives from the module
manifest (the replace block that points github.com/evstack/ev-node and
github.com/evstack/ev-node/execution/evm to relative ../../ paths) so the
published module no longer contains repository-specific local wiring; keep the
go.mod clean for downstream consumers and, if local development linking is
needed, use a go.work file or temporary local replace only in your working copy
before publishing.
In `@block/internal/syncing/syncer.go`:
- Around line 1230-1235: The retry uses the identical currentState so
errInvalidState from AssertValidForNextState will always recur; before retrying
the call to trySyncNextBlockWithState, refresh the snapshot/state (e.g. read
s.GetLastState() or obtain the latest snapshot used by AssertValidForNextState)
and pass that refreshed state into trySyncNextBlockWithState instead of reusing
the original currentState, or recompute/validate the next state after calling
s.SetLastState(currentState) so the second call sees the updated global state.
In `@CHANGELOG.md`:
- Around line 12-21: The changelog contains unresolved git conflict markers
(<<<<<<<, =======, >>>>>>>) — remove those markers and produce a single resolved
section (e.g., keep "### Added") combining both bullet entries into the final
list so both PRs stay: the publisher-mode synchronization line [`#3222`] and the
"Improve execution/evm check..." line [`#3221`]; ensure only one header remains
and that the PR links and bullets are intact and properly formatted.
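As a hypothetical illustration of the resolution described above (bullet text abbreviated with `...` as in the source; only the conflict shape is assumed):

```markdown
<!-- before: unresolved conflict markers -->
### Added
<<<<<<< HEAD
- Publisher-mode synchronization option for failover scenario ([#3222])
=======
- Improve execution/evm check ... ([#3221])
>>>>>>> main

<!-- after: one header, both bullets kept -->
### Added
- Publisher-mode synchronization option for failover scenario ([#3222])
- Improve execution/evm check ... ([#3221])
```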
In `@docs/guides/raft_production.md`:
- Around line 93-100: Update the example to use the canonical aggregator flag
and drop the legacy bootstrap flag: replace the --rollkit.node.aggregator=true
occurrence with --evnode.node.aggregator=true, and remove the
--evnode.raft.bootstrap=true line (only keep raft-related flags like
--evnode.raft.enable, --evnode.raft.node_id, --evnode.raft.raft_addr,
--evnode.raft.raft_dir, and --evnode.raft.peers). Ensure the rest of the example
(e.g., --rollkit.p2p.listen_address) stays unchanged unless also intended to be
canonicalized.
In `@node/failover.go`:
- Around line 193-209: The gating logic ignores storeHeight when deciding to
start in publisher mode; change the condition that currently checks headerHeight
and dataHeight to include storeHeight so publishing only starts when the main
block store and both sync stores are empty. In other words, adjust the boolean
check around headerSyncService.Store().Height() and
dataSyncService.Store().Height() (the if that returns false) to also consider
the local storeHeight variable so StartForPublishing is only taken when
storeHeight == 0 && headerHeight == 0 && dataHeight == 0.
In `@pkg/raft/election.go`:
- Around line 93-96: The wait here uses d.node.waitForMsgsLanded which
internally uses context.Background(), so follower startup isn’t cancellable;
update this call to use the Run cancellation context (e.g., pass the existing
ctx) and add a ctx-aware variant on the node (e.g., waitForMsgsLandedCtx(ctx,
timeout) or change waitForMsgsLanded to accept a context) so the call in
election.go uses that context and honors cancellation/role changes; modify the
node implementation (pkg/raft/node.go – waitForMsgsLanded) to accept and
propagate the provided context instead of creating context.Background().
---
Outside diff comments:
In `@block/internal/syncing/syncer.go`:
- Around line 1204-1213: The fallback after s.store.GetState should bootstrap
the same full genesis state as initializeState() instead of only setting
ChainID/InitialHeight/LastBlockHeight; update the recovery path that sets
currentState and stateBootstrapped to call the same initialization logic (e.g.,
run the InitChain/bootstrap sequence used in initializeState) so that
currentState.AppHash, DAHeight, LastBlockTime (and any other genesis-seeded
fields) are populated before the raft/recovered block is executed; ensure you
invoke the same executor/init routine (the InitChain or genesis bootstrap used
by initializeState) rather than manually setting only those three fields.
In `@pkg/sync/sync_service.go`:
- Around line 190-205: The startup may leave store/exchange/exchange-server
running if later steps fail (e.g. newSyncer or startSubscriber); update the
startup sequence (functions prepareStart / setupP2PInfrastructure /
initFromP2PWithRetry) to register and invoke cleanup on error: after
setupP2PInfrastructure returns and before calling newSyncer or startSubscriber,
capture the created components (store, exchangeServer, exchange or any returned
tearDown/Close funcs) and ensure they are stopped when a subsequent error occurs
(use a deferred cleanup or explicit teardown call on error paths in the block
around prepareStart -> initFromP2PWithRetry -> newSyncer/startSubscriber).
Reference symbols to change: prepareStart, setupP2PInfrastructure,
initFromP2PWithRetry, newSyncer, and startSubscriber — ensure each path that
returns an error calls the appropriate stop/Close/Shutdown methods for the
started components to avoid goroutine/resource leaks.
In `@test/e2e/sut_helper.go`:
- Around line 189-195: The per-process logfile created by os.Create (logfile) in
the block that checks s.processLogDir() is never closed; wrap the logfile so it
is closed when the returned readers are closed by replacing the
io.NopCloser(io.TeeReader(..., logfile)) usage with a ReadCloser that closes the
underlying logfile on Close, or otherwise ensure logfile.Close() is called
(e.g., return a combined io.ReadCloser that delegates Read to the TeeReader and
Close to logfile.Close) for both errReader and outReader so the file descriptor
is released and buffered data flushed.
---
Nitpick comments:
In `@pkg/sync/sync_service_test.go`:
- Around line 63-97: The test currently only asserts svc.storeInitialized after
broadcasting; add an assertion that the remote peer (client2) actually receives
the broadcasted header: before calling svc.WriteToStoreAndBroadcast, register or
subscribe a handler on client2 to capture incoming P2PSignedHeader messages
(using client2's subscribe/handler API), then after WriteToStoreAndBroadcast use
require.Eventually to poll/assert that the handler received a header equal to
signedHeader (or matching height/DataHash/AppHash). Reference client1/client2,
NewHeaderSyncService, StartForPublishing, WriteToStoreAndBroadcast and
svc.storeInitialized when locating where to add the subscription and the
eventual assertion.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8b5476c3-9ec8-4d67-a4cb-fcde2300edbd
⛔ Files ignored due to path filters (1)
`apps/evm/go.sum` is excluded by `!**/*.sum`
📒 Files selected for processing (19)
- CHANGELOG.md
- apps/evm/go.mod
- block/internal/syncing/syncer.go
- block/internal/syncing/syncer_test.go
- docs/guides/raft_production.md
- docs/learn/config.md
- node/failover.go
- pkg/config/config.go
- pkg/config/config_test.go
- pkg/raft/election.go
- pkg/raft/election_test.go
- pkg/raft/node.go
- pkg/raft/node_test.go
- pkg/sync/sync_service.go
- pkg/sync/sync_service_test.go
- pkg/sync/syncer_status.go
- pkg/sync/syncer_status_test.go
- test/e2e/failover_e2e_test.go
- test/e2e/sut_helper.go
```go
replace (
	github.com/evstack/ev-node => ../../
	github.com/evstack/ev-node/execution/evm => ../../execution/evm
)
```
🧩 Analysis chain
🏁 Script executed:
```shell
# Check for go.work file (monorepo pattern)
fd -H 'go.work' .

# Explore repository structure
ls -la apps/ 2>/dev/null || echo "apps directory structure"
find . -name 'go.mod' -type f | head -20
```

Repository: evstack/ev-node
Length of output: 631
🏁 Script executed:

```shell
# Check the complete go.mod for apps/evm
cat -n apps/evm/go.mod
```

Repository: evstack/ev-node
Length of output: 14568
🏁 Script executed:
```shell
# Look for CI/CD configuration to understand build patterns
find . -name '.github' -type d
find . -type f \( -name '*.yml' -o -name '*.yaml' \) -path '*/.github/workflows/*' | head -10
```

Repository: evstack/ev-node
Length of output: 434
🏁 Script executed:
```shell
# Check if there are any docs or comments about module publishing
grep -r "apps/evm" --include='*.md' --include='*.txt' . 2>/dev/null | head -10
```

Repository: evstack/ev-node
Length of output: 579
Remove repo-local replace directives from the published module manifest.
These relative paths only resolve within this repository layout and break downstream builds when github.com/evstack/ev-node/apps/evm is used as a dependency. The module is published with independent versioning (per RELEASE.md); local wiring should use the existing go.work pattern or be removed before merge.
Suggested adjustment

```diff
-replace (
-    github.com/evstack/ev-node => ../../
-    github.com/evstack/ev-node/execution/evm => ../../execution/evm
-)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/evm/go.mod` around lines 5 - 8, Remove the repo-local replace directives
from the module manifest (the replace block that points
github.com/evstack/ev-node and github.com/evstack/ev-node/execution/evm to
relative ../../ paths) so the published module no longer contains
repository-specific local wiring; keep the go.mod clean for downstream consumers
and, if local development linking is needed, use a go.work file or temporary
local replace only in your working copy before publishing.
block/internal/syncing/syncer.go
Outdated
```go
err := s.trySyncNextBlockWithState(ctx, event, currentState)
if err != nil && stateBootstrapped && errors.Is(err, errInvalidState) {
	s.logger.Debug().Err(err).Msg("raft recovery failed after bootstrap state init, retrying once")
	// Keep strict validation semantics; this retry only guards startup ordering races.
	s.SetLastState(currentState)
	err = s.trySyncNextBlockWithState(ctx, event, currentState)
```
The retry cannot clear errInvalidState.
Lines 719-727 validate against the currentState argument, and Line 1235 passes the same snapshot again. s.SetLastState(currentState) only updates the global pointer, so an errInvalidState from AssertValidForNextState will reproduce deterministically on the retry.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@block/internal/syncing/syncer.go` around lines 1230 - 1235, The retry uses
the identical currentState so errInvalidState from AssertValidForNextState will
always recur; before retrying the call to trySyncNextBlockWithState, refresh the
snapshot/state (e.g. read s.GetLastState() or obtain the latest snapshot used by
AssertValidForNextState) and pass that refreshed state into
trySyncNextBlockWithState instead of reusing the original currentState, or
recompute/validate the next state after calling s.SetLastState(currentState) so
the second call sees the updated global state.
```go
if err = d.node.waitForMsgsLanded(d.node.Config().SendTimeout); err != nil {
	// this wait can legitimately time out
	d.logger.Debug().Err(err).Msg("timed out waiting for raft messages before follower verification; continuing")
}
```
Keep follower startup cancellable.
This new wait path ignores Run cancellation because pkg/raft/node.go:145-175 implements waitForMsgsLanded with its own context.Background() timeout. On shutdown or role changes, follower startup can now sit here for up to SendTimeout. A ctx-aware variant would keep this path responsive. As per coding guidelines, "Use context.Context for cancellation in Go".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/raft/election.go` around lines 93 - 96, The wait here uses
d.node.waitForMsgsLanded which internally uses context.Background(), so follower
startup isn’t cancellable; update this call to use the Run cancellation context
(e.g., pass the existing ctx) and add a ctx-aware variant on the node (e.g.,
waitForMsgsLandedCtx(ctx, timeout) or change waitForMsgsLanded to accept a
context) so the call in election.go uses that context and honors
cancellation/role changes; modify the node implementation (pkg/raft/node.go –
waitForMsgsLanded) to accept and propagate the provided context instead of
creating context.Background().
♻️ Duplicate comments (1)
docs/guides/raft_production.md (1)
93-100: ⚠️ Potential issue | 🟡 Minor

Remove the bootstrap flag from the example command.

The documentation now describes `--evnode.raft.bootstrap` as a compatibility-only flag (line 36) and states that "startup mode is selected automatically from persisted raft configuration state" (lines 60-62). However, the example still includes `--evnode.raft.bootstrap=true` on line 98, making the configuration appear more complex than necessary and contradicting the "optional" guidance.

For the primary/happy-path example, omit the bootstrap flag to demonstrate the recommended automatic mode detection.
✂️ Proposed fix
```diff
 --evnode.raft.enable=true \
 --evnode.raft.node_id="node-1" \
 --evnode.raft.raft_addr="0.0.0.0:5001" \
 --evnode.raft.raft_dir="/var/lib/ev-node/raft" \
---evnode.raft.bootstrap=true \
 --evnode.raft.peers="node-1@10.0.1.1:5001,node-2@10.0.1.2:5001,node-3@10.0.1.3:5001" \
 --evnode.p2p.listen_address="/ip4/0.0.0.0/tcp/26656" \
```

As per coding guidelines, `docs/guides/**/*.md`: Structure guides with clear step-by-step instructions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/guides/raft_production.md` around lines 93 - 100, The example command still includes the compatibility-only flag --evnode.raft.bootstrap=true which contradicts the guide text that startup mode is selected automatically; remove the --evnode.raft.bootstrap option from the example lines (the stanza containing --evnode.raft.enable, --evnode.raft.node_id, --evnode.raft.raft_addr, --evnode.raft.raft_dir, --evnode.raft.peers, and --evnode.p2p.listen_address) so the primary/happy-path example demonstrates automatic mode detection without the bootstrap flag.
🧹 Nitpick comments (2)
docs/guides/raft_production.md (2)
58-67: Fix markdown formatting issues.

Static analysis flagged several markdown formatting violations in this section:
- Line 58: Missing blank line above the heading
- Lines 60-66: Inconsistent spacing after list markers (3 spaces instead of 1)
These don't affect the rendered output in most viewers but should be fixed for consistency and linter compliance.
🔧 Proposed formatting fixes
```diff
 ### 1. Static Peering & Automatic Startup Mode

 Use static peering with automatic mode selection from local raft configuration:

-*   If local raft configuration already exists in `--evnode.raft.raft_dir`, the node starts in rejoin mode.
-*   If no local raft configuration exists yet, the node bootstraps from configured peers.
-*   `--evnode.raft.bootstrap` is retained for compatibility but does not control mode selection.
-*   **All configured cluster members** should list the full set of peers in `--evnode.raft.peers`.
+* If local raft configuration already exists in `--evnode.raft.raft_dir`, the node starts in rejoin mode.
+* If no local raft configuration exists yet, the node bootstraps from configured peers.
+* `--evnode.raft.bootstrap` is retained for compatibility but does not control mode selection.
+* **All configured cluster members** should list the full set of peers in `--evnode.raft.peers`.
 * The `peers` list format is strict: `NodeID@Host:Port`.
-*   **Limitation**: Dynamic addition of peers (run-time membership changes) via RPC/CLI is not currently exposed.
-*   **Not supported**: Joining an existing cluster as a brand-new node that was not part of the initial static membership.
+* **Limitation**: Dynamic addition of peers (run-time membership changes) via RPC/CLI is not currently exposed.
+* **Not supported**: Joining an existing cluster as a brand-new node that was not part of the initial static membership.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/guides/raft_production.md` around lines 58 - 67, Add a blank line before the "### 1. Static Peering & Automatic Startup Mode" heading and normalize the list bullets to use a single space after each marker and consistent single-space indentation; update the list items referencing flags like --evnode.raft.raft_dir, --evnode.raft.bootstrap, --evnode.raft.peers and the peers format `NodeID@Host:Port` so each bullet begins with "- " (dash + one space) and remove extra spaces after markers to satisfy markdown linter expectations.
36-36: Clarify what "compatibility flag" means.

The description states the bootstrap flag is for "compatibility" but doesn't explain with what or whether it has any effect. Consider being explicit: does the flag still influence behavior, or is it completely ignored? If it's retained only for backward config-file compatibility (parsed but unused), state that clearly.
📝 Suggested clarification
```diff
-| `--evnode.raft.bootstrap` | `raft.bootstrap` | Compatibility flag. Startup mode is selected automatically from persisted raft configuration state. | optional |
+| `--evnode.raft.bootstrap` | `raft.bootstrap` | Legacy flag (parsed but unused). Startup mode is selected automatically from persisted raft configuration state. | omit |
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/guides/raft_production.md` at line 36, The table entry for `--evnode.raft.bootstrap` / `raft.bootstrap` is vague about "compatibility flag"; update the description to explicitly state its current behavior (e.g., if it is retained only for backward compatibility and parsed but ignored at runtime, change the text to "Deprecated compatibility flag — parsed for backward compatibility but ignored; startup mode is selected automatically from persisted Raft configuration state"; if it still affects startup behavior, state that and describe the exact effect). Make the change to the table cell containing `--evnode.raft.bootstrap` / `raft.bootstrap` so readers know whether the flag is effective or merely retained for legacy config parsing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@docs/guides/raft_production.md`:
- Around line 93-100: The example command still includes the compatibility-only
flag --evnode.raft.bootstrap=true which contradicts the guide text that startup
mode is selected automatically; remove the --evnode.raft.bootstrap option from
the example lines (the stanza containing --evnode.raft.enable,
--evnode.raft.node_id, --evnode.raft.raft_addr, --evnode.raft.raft_dir,
--evnode.raft.peers, and --evnode.p2p.listen_address) so the primary/happy-path
example demonstrates automatic mode detection without the bootstrap flag.
---
Nitpick comments:
In `@docs/guides/raft_production.md`:
- Around line 58-67: Add a blank line before the "### 1. Static Peering &
Automatic Startup Mode" heading and normalize the list bullets to use a single
space after each marker and consistent single-space indentation; update the list
items referencing flags like --evnode.raft.raft_dir, --evnode.raft.bootstrap,
--evnode.raft.peers and the peers format `NodeID@Host:Port` so each bullet
begins with "- " (dash + one space) and remove extra spaces after markers to
satisfy markdown linter expectations.
- Line 36: The table entry for `--evnode.raft.bootstrap` / `raft.bootstrap` is
vague about "compatibility flag"; update the description to explicitly state its
current behavior (e.g., if it is retained only for backward compatibility and
parsed but ignored at runtime, change the text to "Deprecated compatibility flag
— parsed for backward compatibility but ignored; startup mode is selected
automatically from persisted Raft configuration state"; if it still affects
startup behavior, state that and describe the exact effect). Make the change to
the table cell containing `--evnode.raft.bootstrap` / `raft.bootstrap` so
readers know whether the flag is effective or merely retained for legacy config
parsing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b93e6a52-29c2-4bb1-a32e-10cacf79f60b
📒 Files selected for processing (1)
docs/guides/raft_production.md
```go
		Source: "",
	})
}
err := s.trySyncNextBlockWithState(ctx, event, currentState)
```
this change is effectively to use the correct state when we override it at L1207, right?
yes, before it was re-read in the method and could be empty
```go
func (syncService *SyncService[H]) startSyncer(ctx context.Context) error {
	if syncService.syncerStatus.isStarted() {
		return nil
// StartForPublishing starts the sync service in publisher mode.
```
Trying to wrap my head around this. Why is it exactly needed? A node can fetch block 1 over DA and p2p 🤔.
Or is it because the connection happens before DA fetching, effectively deadlocking it?
There was a race that could lead to a deadlock:
- P2P init waits for block 1 from peers.
- Block 1 production waits for P2P init to finish.
Overview
E2E HA tests sometimes fail on a race where the leader waits for p2p sync to complete on a fresh start.
Summary by CodeRabbit
New Features
Documentation
Improvements
Tests
Chores